18 research outputs found

    The Stable Signature: Rooting Watermarks in Latent Diffusion Models

    Full text link
    Generative image modeling enables a wide range of applications but raises ethical concerns about responsible deployment. This paper introduces an active strategy combining image watermarking and Latent Diffusion Models. The goal is for all generated images to conceal an invisible watermark allowing for future detection and/or identification. The method quickly fine-tunes the latent decoder of the image generator, conditioned on a binary signature. A pre-trained watermark extractor recovers the hidden signature from any generated image and a statistical test then determines whether it comes from the generative model. We evaluate the invisibility and robustness of the watermarks on a variety of generation tasks, showing that Stable Signature works even after the images are modified. For instance, it detects the origin of an image generated from a text prompt, then cropped to keep 10%10\% of the content, with 9090+%\% accuracy at a false positive rate below 10āˆ’6^{-6}.Comment: Website at https://pierrefdz.github.io/publications/stablesignatur

    Superfilamentation in air

    Full text link
    The interaction between a large number of laser filaments brought together using weak external focusing leads to the emergence of few filamentary structures reminiscent of standard filaments, but carrying a higher intensity. The resulting plasma is measured to be one order of magnitude denser than for short-scale filaments. This new propagation regime is dubbed superfilamentation. Numerical simulations of a nonlinear envelope equation provide good agreement with experiments.Comment: 5 pages, 4 figure

    Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

    Full text link
    Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfections in the proxy reward may hinder the training and lead to suboptimal results; the diversity of objectives in real-world tasks and human opinions exacerbates the issue. This paper proposes embracing the heterogeneity of diverse rewards by following a multi-policy strategy. Rather than focusing on a single a priori reward, we aim for Pareto-optimal generalization across the entire space of preferences. To this end, we propose rewarded soup, first specializing multiple networks independently (one for each proxy reward) and then interpolating their weights linearly. This succeeds empirically because we show that the weights remain linearly connected when fine-tuned on diverse rewards from a shared pre-trained initialization. We demonstrate the effectiveness of our approach for text-to-text (summarization, Q&A, helpful assistant, review), text-image (image captioning, text-to-image generation, visual grounding, VQA), and control (locomotion) tasks. We hope to enhance the alignment of deep models, and how they interact with the world in all its diversity.
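
    The core operation described above, linearly interpolating the weights of networks fine-tuned on different rewards from a shared initialization, is simple enough to sketch. The snippet below is a schematic illustration; the model and reward names in the usage comment are placeholders, not the paper's code.

```python
# Schematic "rewarded soup": average the weights of models fine-tuned from the
# same pre-trained initialization on different proxy rewards (illustrative only).
import torch

def rewarded_soup(state_dicts, coeffs):
    """Linearly interpolate parameter dicts; coeffs are preference weights summing to 1."""
    assert abs(sum(coeffs) - 1.0) < 1e-6
    soup = {}
    for name in state_dicts[0]:
        soup[name] = sum(lam * sd[name].float() for lam, sd in zip(coeffs, state_dicts))
    return soup

# Usage sketch: two policies specialized on different rewards, blended 70/30.
# policy_a, policy_b = fine_tune(reward_1), fine_tune(reward_2)   # hypothetical helpers
# blended = rewarded_soup([policy_a.state_dict(), policy_b.state_dict()], [0.7, 0.3])
# model.load_state_dict(blended)
```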

    Semantic image editing from textual queries (Édition sémantique d'images à partir de requêtes textuelles)

    No full text
    The aim of this thesis is to propose algorithms for the task of Text-based Image Editing (TIE), which consists in editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog and the query "Change the dog into a cat", we want to produce a novel image where the dog has been replaced by a cat, keeping all other image aspects unchanged (animal color and pose, background). The north-star goal is to enable anyone to edit their images using only queries in natural language. One specificity of text-based image editing is that there is practically no training data to train a supervised algorithm. In this thesis, we propose different solutions for editing images, based on the adaptation of large multimodal models trained on huge datasets. We first study a simplified editing setup, named retrieval-based image editing, which does not require directly modifying the input image. Instead, given the image and modification query, we search a large database for an image that corresponds to the requested edit. We leverage multimodal image/text alignment models trained on web-scale datasets (like CLIP) to perform such transformations without any examples. We also propose the SIMAT framework for evaluating retrieval-based image editing. We then study how to directly modify the input image. We propose FlexIT, a method which iteratively changes the input image until it satisfies an abstract "editing objective" defined in a multimodal embedding space. We introduce a variety of regularization terms to enforce realistic transformations. Next, we focus on diffusion models, which are powerful generative models able to synthesize novel images conditioned on a wide variety of textual prompts. We demonstrate their versatility by proposing DiffEdit, an algorithm which adapts diffusion models for image editing without fine-tuning. We propose a zero-shot strategy for automatically finding where the initial image should be changed to satisfy the text transformation query. Finally, we study a specific challenge useful in the context of image editing: how to synthesize a novel image given, as constraint, a spatial layout of objects with textual descriptions, a task known as Semantic Image Synthesis. We adopt the same strategy, consisting in adapting diffusion models to solve the task without any example. We propose the ZestGuide algorithm, which leverages the spatio-semantic information encoded in the attention layers of diffusion models.
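
    As a rough illustration of the retrieval-based editing setup described above, one can embed the query image and the edit text with CLIP, form a combined target embedding, and retrieve the nearest database image. The additive composition rule below is a simplifying assumption chosen for the sketch, not the exact transformation studied in the thesis.

```python
# Sketch of retrieval-based image editing with CLIP embeddings (illustrative;
# the additive query composition is an assumption, not the thesis method).
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def edit_by_retrieval(image_path, edit_text, db_embeddings, db_paths):
    """Return the database image whose embedding best matches image + edit text."""
    with torch.no_grad():
        img = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
        img_emb = model.encode_image(img)
        txt_emb = model.encode_text(clip.tokenize([edit_text]).to(device))
    target = img_emb + txt_emb                      # naive additive composition
    target = target / target.norm(dim=-1, keepdim=True)
    sims = db_embeddings @ target.squeeze(0)        # db_embeddings assumed pre-normalized, shape (N, D)
    return db_paths[int(sims.argmax())]
```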

    Functional invariants to watermark large transformers

    No full text
    The rapid growth of transformer-based models increases concerns about their integrity and ownership assurance. Watermarking addresses this issue by embedding a unique identifier into the model while preserving its performance. However, most existing approaches require optimizing the weights to imprint the watermark signal, which is not suitable at scale due to the computational cost. This paper explores watermarks with virtually no computational cost, applicable to a non-blind white-box setting (assuming access to both the original and watermarked networks). They generate functionally equivalent copies by leveraging the models' invariance, via operations like dimension permutations or scaling/unscaling. This makes it possible to watermark models without any change in their outputs and remains stealthy. Experiments demonstrate the effectiveness of the approach and its robustness against various model transformations (fine-tuning, quantization, pruning), making it a practical solution to protect the integrity of large models.
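
    The permutation invariance the abstract refers to can be sketched on a toy two-layer MLP: permuting the hidden units (rows of the first weight matrix and its bias, and the matching columns of the second weight matrix) yields a functionally identical network, so the secret permutation itself can serve as the identifier. This is a minimal sketch, not the paper's transformer-specific procedure.

```python
# Minimal sketch of a function-preserving permutation watermark on a 2-layer MLP
# (illustrative; the paper applies analogous invariants to large transformers).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 16, 4
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))

def mlp(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)  # ReLU MLP

# The watermark "key" is a secret permutation of the hidden dimension.
perm = rng.permutation(d_hidden)
W1_wm, b1_wm, W2_wm = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2), mlp(x, W1_wm, b1_wm, W2_wm))  # same function
print("outputs identical; the permutation acts as the embedded identifier")
```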

    Generation of long-lived underdense channels using femtosecond filamentation in air

    No full text
    Using femtosecond laser pulses at 800 and 400 nm, we characterize the formation of underdense channels in air generated by laser filamentation at the millijoule energy level by means of transverse interferometry. We find that, using tight focusing conditions, filamentation generates a shock wave and that the resulting low-density channel lasts for more than 90 ms. Comparison of these results with hydrodynamic simulations using an Eulerian hydrodynamic code gives good agreement and allows us to estimate the initial gas peak temperature at ∼1000 K. The influence of experimental parameters such as the focusing conditions for the ultrashort laser pulse, its polarization, or its wavelength is studied and linked to previous characterizations of filamentation-generated plasma columns.
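
    As a back-of-envelope check on the ∼1000 K estimate quoted above: once the shock wave has departed and the pressure has relaxed to ambient, the ideal-gas law at constant pressure links the residual channel density to the peak temperature. With an ambient temperature of about 300 K this gives a channel at roughly 30% of ambient density, the kind of long-lived depression the interferometry measures. The numbers here are only an order-of-magnitude illustration, not values from the paper.

```latex
% Isobaric estimate of the channel density after pressure relaxation (illustrative)
\frac{\rho_{\mathrm{channel}}}{\rho_{0}} \simeq \frac{T_{0}}{T_{\mathrm{peak}}}
\approx \frac{300\,\mathrm{K}}{1000\,\mathrm{K}} \approx 0.3
```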